Abstract

‘Analyze Boston’ is an open data initiative maintained by the City of Boston containing facts, figures, and maps related to the city. In this project we will look at some of the city’s 133+ data sets and analyze them with descriptive and inferential statistics. The focus of the project is the discovery of interesting patterns.

Intro to the Data

In this analysis, I will be using four different data sets.

Employee Earnings Report

  • Each year, the City of Boston publishes payroll data for employees. This dataset contains employee names, job details, and earnings information including base salary, overtime, and total compensation for employees of the City.

  • You can see more at https://data.boston.gov/dataset/employee-earnings-report

knitr::kable(head(earnings_2017), digits = 6, booktabs = TRUE, 
    longtable = TRUE)
| NAME | DEPARTMENT NAME | TITLE | REGULAR | RETRO | OTHER | OVERTIME | INJURED | DETAIL | QUINN/EDUCATION INCENTIVE | TOTAL EARNINGS | POSTAL |
|------|-----------------|-------|---------|-------|-------|----------|---------|--------|---------------------------|----------------|--------|
| Miller,Francis T | Boston Police Department | Police Lieutenant (Det) | 129530.83 | NA | 13694.50 | 8150.41 | NA | NA | 24286.86 | 175662.60 | 2043 |
| Sullivan,Patrice Murphy | Boston Police Department | Office Manager | 56921.80 | NA | 3594.65 | 1548.16 | NA | NA | NA | 62064.61 | 2132 |
| O’Hara,Michael C | Boston Police Department | Police Lieutenant | 124056.64 | NA | 6432.23 | 29043.67 | NA | 22385 | 23260.69 | 205178.23 | 2021 |
| Whalen,Richard L | Boston Police Department | Police Detective | 94956.25 | 4985.32 | 13591.93 | 85419.22 | 58.03 | 16330 | 19971.01 | 235311.76 | 2062 |
| Kelly,Ann Marie | Boston Police Department | Tape Librarian I (BPD) | 69995.34 | NA | 300.00 | 7961.08 | NA | NA | NA | 78256.42 | 2190 |
| Carroll,William F | Boston Police Department | Police Offc/Hosp Liaison | 12756.79 | 2390.14 | 41612.10 | 912.04 | NA | NA | 1563.30 | 59234.37 | 2131 |

Crime Incident Reports

  • Crime incident reports are provided by Boston Police Department (BPD) to document the initial details surrounding an incident to which BPD officers respond. This is a dataset containing records from the new crime incident report system, which includes a reduced set of fields focused on capturing the type of incident as well as when and where it occurred. Records in the new system begin in June of 2015.

  • You can see more at https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system

crime %>% head()
## # A tibble: 6 x 17
##   INCIDENT_NUMBER OFFENSE_CODE OFFENSE_CODE_GRO… OFFENSE_DESCRIP… DISTRICT
##   <chr>           <chr>        <chr>             <chr>            <chr>   
## 1 I182017604      03115        Investigate Pers… INVESTIGATE PER… B3      
## 2 I182017601      00520        Residential Burg… BURGLARY - RESI… B2      
## 3 I182017596      03831        Motor Vehicle Ac… M/V - LEAVING S… A1      
## 4 I182017595      03802        Motor Vehicle Ac… "M/V ACCIDENT -… <NA>    
## 5 I182017594      01830        Drug Violation    DRUGS - SICK AS… D14     
## 6 I182017593      00361        Robbery           ROBBERY - OTHER  D4      
## # ... with 12 more variables: REPORTING_AREA <int>, SHOOTING <chr>,
## #   OCCURRED_ON_DATE <dttm>, YEAR <int>, MONTH <int>, DAY_OF_WEEK <chr>,
## #   HOUR <int>, UCR_PART <chr>, STREET <chr>, Lat <dbl>, Long <dbl>,
## #   Location <chr>

BPD Firearm Recovery Counts

gun_recovery %>% head()
## # A tibble: 6 x 4
##   CollectionDate CrimeGunsRecovered GunsSurrenderedSafe… BuybackGunsRecov…
##   <chr>                       <int>                <int>             <int>
## 1 8/20/2014                       2                    3                 1
## 2 8/21/2014                       2                    0                 4
## 3 8/22/2014                       0                    0                 2
## 4 8/25/2014                       8                    3                 0
## 5 8/26/2014                       9                    0                 0
## 6 8/27/2014                       1                    0                 0

Economic Indicators

  • The Boston Planning & Development Agency (BPDA), formerly known as the Boston Redevelopment Authority (BRA), is tasked with planning for and guiding inclusive growth within the City of Boston. As part of this mission, the BPDA collects and analyzes a variety of economic data on topics such as employment, housing, travel, and real estate development. This is a legacy dataset of economic indicators tracked monthly between January 2013 and January 2015.

  • You can see more at https://data.boston.gov/dataset/economic-indicators-legacy-portal

econ_indicators %>% head()
## # A tibble: 6 x 19
##    Year Month logan_passengers logan_intl_flights hotel_occup_rate
##   <int> <int>            <int>              <int>            <dbl>
## 1  2013     1          2019662               2986            0.572
## 2  2013     2          1878731               2587            0.645
## 3  2013     3          2469155               3250            0.819
## 4  2013     4          2551246               3408            0.855
## 5  2013     5          2676291               3240            0.858
## 6  2013     6          2824862               3402            0.911
## # ... with 14 more variables: hotel_avg_daily_rate <dbl>,
## #   total_jobs <int>, unemp_rate <dbl>, labor_force_part_rate <dbl>,
## #   pipeline_unit <int>, pipeline_total_dev_cost <dbl>,
## #   pipeline_sqft <int>, pipeline_const_jobs <dbl>, foreclosure_pet <int>,
## #   foreclosure_deeds <int>, med_housing_price <int>,
## #   housing_sales_vol <int>, new_housing_const_permits <int>,
## #   `new-affordable_housing_permits` <int>

Data Cleaning

The first thing that needs to be done is tidying up the data. We can start by replacing NAs in the numeric columns with 0s.

# check to see if there are any NA entries in non numeric
# columns

sum(is.na(earnings_2017$NAME))
## [1] 0
sum(is.na(earnings_2017$`DEPARTMENT NAME`))
## [1] 0
sum(is.na(earnings_2017$TITLE))
## [1] 0
# we are good!

# set NA to 0 in numeric columns

earnings_2017[is.na(earnings_2017)] <- 0

Since the minimum wage is $11/hr, we can separate full-time from part-time employees. A minimum-wage, full-time year comes to 11 * 40 * 52 = $22,880, so I will remove anyone who makes less than that, rounding the threshold down to $20,000.
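As a quick sanity check, the threshold arithmetic can be spelled out directly (a small sketch; the variable names are illustrative, not columns from the dataset):

```r
# hypothetical sketch: annual pay for a minimum-wage, full-time worker
min_wage <- 11          # dollars per hour
hours_per_week <- 40
weeks_per_year <- 52

full_time_floor <- min_wage * hours_per_week * weeks_per_year
full_time_floor         # 22880, rounded down to a 20000 cutoff below
```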

# We can also do some mutating to give us a cleaner data
# frame

earnings_2017 <- earnings_2017 %>% filter(`TOTAL EARNINGS` > 
    20000) %>% mutate(EXTRA_PAY = RETRO + OTHER + DETAIL + 
    `QUINN/EDUCATION INCENTIVE`) %>% select(NAME, `DEPARTMENT NAME`, 
    TITLE, REGULAR, OVERTIME, EXTRA_PAY, `TOTAL EARNINGS`, 
    INJURED)

Now we can begin to clean up the department name factors to get a clearer view of the groups as a whole. The school system accounts for the majority of the factors, so let's compress them all into a single factor: “Education”.

# We want to combine all of the factors that are within the
# education system into one education factor

# We start with 224 factors

head(factor(unique(earnings_2017$`DEPARTMENT NAME`)))
## [1] Boston Police Department       Workers Compensation Service  
## [3] BPS East Boston High           BPS School Safety Service     
## [5] Dpt of Innovation & Technology BPS Ohrenberger Elementary    
## 224 Levels: Accountability Achievement Gap ... Youth Engagement & Employment
# find all factors containing BPS

earnings_2017_bps_factors <- str_detect(earnings_2017$`DEPARTMENT NAME`, 
    "BPS")

head(unique(earnings_2017$`DEPARTMENT NAME`[earnings_2017_bps_factors]))
## [1] "BPS East Boston High"       "BPS School Safety Service" 
## [3] "BPS Ohrenberger Elementary" "BPS Transportation"        
## [5] "BPS Quincy Elementary"      "BPS McCormack Middle"
# now let's add K-8

earnings_2017_k8_factors <- str_detect(earnings_2017$`DEPARTMENT NAME`, 
    "K-8")

head(unique(earnings_2017$`DEPARTMENT NAME`[earnings_2017_k8_factors]))
## [1] "Roosevelt K-8"       "Edison K-8"          "Higginson/Lewis K-8"
## [4] "Jackson/Mann K-8"    "Greenwood, S K-8"    "Lyon K-8"
# Academy

earnings_2017_Academy_factors <- str_detect(earnings_2017$`DEPARTMENT NAME`, 
    "Academy")

head(unique(earnings_2017$`DEPARTMENT NAME`[earnings_2017_Academy_factors]))
## [1] "Tech Boston Academy"         "BPS Latin Academy"          
## [3] "West Roxbury Academy"        "BPS MPH\\Crafts Academy"    
## [5] "Kennedy, EM Health Academy"  "WREC: Urban Science Academy"
# Others

earnings_2017_otherEdu_factors <- c("Mattapan Early Elementary", 
    "English Language Learn", "Kennedy, JF Elementary", "East Boston EEC", 
    "Baldwin ELC", "Superintendent", "Unified Student Svc", "Umana Middle", 
    "Achievement Gap", "Early Learning Services", "Kennedy, PJ Elementary", 
    "Career & Technical Ed", "HPEC: Com Acd Science & Health", 
    "Henderson Elementary", "Frederick Pilot Middle", "West Zone ELC", 
    "Boston Collaborative High Sch", "Haley Pilot", "Enrollment Services", 
    "Haynes EEC", "Teaching & Learning", "BTU Pilot", "Asst Superintendent-Network A", 
    "Ellison/Parks EES", "Alighieri Montessori School", "Lyon Pilot High 9-12", 
    "Dudley St. Neighborhood School", "P. A. Shaw Elementary", 
    "UP \"Unlocking Potential\" Acad", "Greater Egleston High", 
    "Diplomas Plus", "Quincy Upper School", "Student Support Svc", 
    "Chief Academic Officer")

# mutate

earnings_2017$`DEPARTMENT NAME`[earnings_2017_bps_factors] <- "Education"

head(unique(earnings_2017$`DEPARTMENT NAME`))
## [1] "Boston Police Department"       "Workers Compensation Service"  
## [3] "Education"                      "Dpt of Innovation & Technology"
## [5] "Registry Division"              "Boston Fire Department"
# Down to 127 factors

earnings_2017$`DEPARTMENT NAME`[earnings_2017_k8_factors] <- "Education"

head(unique(earnings_2017$`DEPARTMENT NAME`))
## [1] "Boston Police Department"       "Workers Compensation Service"  
## [3] "Education"                      "Dpt of Innovation & Technology"
## [5] "Registry Division"              "Boston Fire Department"
# Down to 108 factors

earnings_2017$`DEPARTMENT NAME`[earnings_2017_Academy_factors] <- "Education"

head(unique(earnings_2017$`DEPARTMENT NAME`))
## [1] "Boston Police Department"       "Workers Compensation Service"  
## [3] "Education"                      "Dpt of Innovation & Technology"
## [5] "Registry Division"              "Boston Fire Department"
# Down to 97 factors

for (i in earnings_2017_otherEdu_factors) {
    logic <- str_detect(earnings_2017$`DEPARTMENT NAME`, i)
    earnings_2017$`DEPARTMENT NAME`[logic] <- "Education"
}

# Down to 63 factors

head(unique(earnings_2017$`DEPARTMENT NAME`))
## [1] "Boston Police Department"       "Workers Compensation Service"  
## [3] "Education"                      "Dpt of Innovation & Technology"
## [5] "Registry Division"              "Boston Fire Department"

Education Pay

earnings_2017 %>% filter(`DEPARTMENT NAME` == "Education") %>% 
    arrange(desc(`TOTAL EARNINGS`)) %>% head()
## # A tibble: 6 x 8
##   NAME  `DEPARTMENT NAM… TITLE REGULAR OVERTIME EXTRA_PAY `TOTAL EARNINGS`
##   <chr> <chr>            <chr>   <dbl>    <dbl>     <dbl>            <dbl>
## 1 Bott… Education        BPS …       0        0         0           285459
## 2 Chan… Education        Supe…  264661        0      6000           270661
## 3 McCa… Education        Teac…   46981        0    182189           229170
## 4 Jord… Education        Unit…  106762        0     81789           188551
## 5 Estr… Education        Depu…  177625        0         0           177625
## 6 Wood… Education        Asst…    4620        0    169047           173667
## # ... with 1 more variable: INJURED <dbl>

From the above, the pay seems about normal. The first entry, Torii Bottomley, won a lawsuit against her employer for workplace bullying:

https://www.pacermonitor.com/public/case/22846991/Bottomley_v_Boston_Public_Schools_et_al

The only other outlier shown is Elaine M McCabe, with a base salary of $46,981 and $182,189.26 in extra pay (not overtime). I was unable to find an explanation for this.

Police Pay

First, let's look at how pay is distributed across the police department.

# Find average pay, sd of pay for police officers and plot

bpd_earners <- earnings_2017 %>% filter(`DEPARTMENT NAME` == 
    "Boston Police Department")

bpd_earners <- bpd_earners %>% mutate(num_std_devs = (`TOTAL EARNINGS` - 
    mean(`TOTAL EARNINGS`))/sd(`TOTAL EARNINGS`)) %>% 
    arrange(desc(`TOTAL EARNINGS`))

head(bpd_earners)
## # A tibble: 6 x 9
##   NAME  `DEPARTMENT NAM… TITLE REGULAR OVERTIME EXTRA_PAY `TOTAL EARNINGS`
##   <chr> <chr>            <chr>   <dbl>    <dbl>     <dbl>            <dbl>
## 1 Hose… Boston Police D… Poli…  146894    62696    156642           366233
## 2 Kerv… Boston Police D… Poli…  125715    66067    150210           341992
## 3 Lee,… Boston Police D… Poli…   97414    71669    171093           340176
## 4 Hass… Boston Police D… Poli…  137104    72158     99536           320224
## 5 McCo… Boston Police D… Poli…  146894    63708    106072           316674
## 6 Jose… Boston Police D… Poli…   97414    87746    126997           312156
## # ... with 2 more variables: INJURED <dbl>, num_std_devs <dbl>
bpd_2_sd <- bpd_earners %>% ggplot(aes(NAME, num_std_devs, color = (num_std_devs > 
    2))) + geom_point() + ggtitle("Officers who make more than $136071.46/year over the average") + 
    xlab("Officers") + ylab("Number of Standard Deviations") + 
    theme(axis.text.x = element_blank(), legend.title = element_blank())

bpd_3_sd <- bpd_earners %>% ggplot(aes(NAME, num_std_devs, color = (num_std_devs > 
    3))) + geom_point() + ggtitle("Officers who make more than $204107.19 / year over the average") + 
    xlab("Officers") + ylab("Number of Standard Deviations") + 
    theme(axis.text.x = element_blank(), legend.title = element_blank())

bpd_earners %>% filter(`TOTAL EARNINGS` > 10000) %>% 
    ggplot(aes(num_std_devs)) + geom_density() + xlab("Number of Standard Deviations") + 
    ylab("Density") + ggtitle("BPD Pay Distribution") + theme_minimal()

sd(bpd_earners$`TOTAL EARNINGS`)
## [1] 57576.17

We can see from the plot above that Boston PD pay is roughly normally distributed around the mean. The graph above shows how many standard deviations each officer's pay is from the mean; the mean itself is $139,345.40 per year and one standard deviation is $57,576.17. Let's now look at our distribution again with an emphasis on our super-earners:
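The `num_std_devs` column computed earlier is just a z-score. As a minimal sketch (using made-up pay values, not the real BPD data), the manual calculation is equivalent to base R's `scale()`:

```r
# illustrative pay values -- not real BPD figures
pay <- c(60000, 95000, 139000, 250000, 366000)

z_manual <- (pay - mean(pay))/sd(pay)    # same formula as the mutate() above
z_scale <- as.numeric(scale(pay))        # scale() centers and divides by sd by default

all.equal(z_manual, z_scale)             # TRUE
```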

gridExtra::grid.arrange(bpd_2_sd, bpd_3_sd)

We can see from above that there are a few people who earn exorbitant sums. What could cause this? The first thing that comes to mind is a high base salary.

bpd_earners %>% arrange(desc(REGULAR)) %>% head()
## # A tibble: 6 x 9
##   NAME  `DEPARTMENT NAM… TITLE REGULAR OVERTIME EXTRA_PAY `TOTAL EARNINGS`
##   <chr> <chr>            <chr>   <dbl>    <dbl>     <dbl>            <dbl>
## 1 Evan… Boston Police D… Comm…  230000        0      8846           238846
## 2 Gros… Boston Police D… Supn…  199244        0     26094           225338
## 3 Buck… Boston Police D… Supn…  181983        0     57978           239961
## 4 Manc… Boston Police D… Supn…  181983        0     57978           239961
## 5 Holm… Boston Police D… Supn…  181983        0     47087           229069
## 6 Ridg… Boston Police D… Supn…  180369        0     57726           238094
## # ... with 2 more variables: INJURED <dbl>, num_std_devs <dbl>

We can see from the table above that the highest-paid person in the Boston Police Department is the Commissioner, with a base salary of about $230,000 and total earnings of about $240,000. Even with this high base salary, he is only 1.68 standard deviations above the average officer pay of $124,254.60. In fact, everyone in the table of top base pay is less than 2 standard deviations above the mean, so base pay isn't what drives such high total pay.

The next thing that comes to mind is heavy overtime combined with extra pay. This extra pay includes things like road detail and testifying in court. We can also check the assumption that those with a higher pay grade (e.g. captains and lieutenants) are likely to collect more overtime money, since their time-and-a-half rate is generally much higher.

# create plot that shows overtime paid by base wage
bpd_overtime <- bpd_earners %>% filter(REGULAR > 20000) %>% 
    ggplot(aes(NAME, OVERTIME, color = (OVERTIME > REGULAR))) + 
    geom_point() + ggtitle("More Overtime Pay than Base Pay") + 
    theme(axis.text.x = element_blank(), legend.title = element_blank()) + 
    xlab("Officers") + ylab("Overtime Paid")

bpd_extra <- bpd_earners %>% filter(REGULAR > 20000) %>% ggplot(aes(NAME, 
    EXTRA_PAY, color = (EXTRA_PAY > REGULAR))) + geom_point() + 
    ggtitle("More Extra Pay than Base Pay") + theme(axis.text.x = element_blank(), 
    legend.title = element_blank()) + xlab("Officers") + ylab("Extra Paid")

gridExtra::grid.arrange(bpd_overtime, bpd_extra)

From the plots above we can see that quite a few officers earn more from overtime or extra pay than from their base salary. This is quite surprising.

If we assume time-and-a-half pay, these officers are working their regular hours and then a great deal more on top. A few make over $100,000 per year from overtime alone.
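To put that in perspective, under the time-and-a-half assumption we can back out the implied overtime hours. This is a rough sketch: the 2,080-hour work year and the hourly-rate derivation are my assumptions, not fields in the dataset, and the figures are taken from the top overtime earner in the table later in this section (REGULAR = 110100, OVERTIME = 144874):

```r
# rough sketch: implied overtime hours, assuming REGULAR covers a
# standard 2080-hour year and overtime pays 1.5x the implied hourly rate
regular <- 110100
overtime <- 144874

hourly_rate <- regular/2080
implied_ot_hours <- overtime/(1.5 * hourly_rate)
round(implied_ot_hours)    # about 1825 hours, i.e. roughly 35 extra hours every week
```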

We can also see that even more officers make more than their base salary in extra pay. This extra pay was defined above as things like court appearances, detail, retroactive pay, and the Quinn education incentive, which gives a small salary bump for having a criminal justice degree. Why could this be? After looking into it, I came across some Boston Globe reports. Here is a quote from https://www.bostonglobe.com/metro/2017/06/20/for-some-boston-police-officers-extra-money-comes-easy/oS47lc7OuYyVZbTPBv1zQL/story.html :

Quote: "

In what critics call an extreme example of a systemic problem, Lee Waiman bolstered his wages thanks to police union contracts that require that officers who work detail shifts or testify in court be paid a minimum of four hours, even if the assignment lasts only 30 minutes.

Last year, Lee earned $58,600 by working more than 1,100 hours of overtime, according to a Globe review of police payroll records. Records show that Lee did not work 674 of those hours — more than 16 40-hour weeks — yet received time-and-a-half pay.

Most of Lee’s overtime pay stemmed from court appearances that typically lasted no more than an hour, the Globe found. He was also paid for 2,771 hours for detail shifts, including 861 unworked hours. That allowed him to make close to $130,000, a sum that did not include his overtime pay.

“It’s a generous system,” said Sam Tyler, president of the Boston Municipal Research Bureau, a fiscal watchdog group. “You’re paid for hours you don’t work. It isn’t a new issue, but it’s one that really does need stricter focus and management to control those costs.”

"

Clearly this is a known pattern that has been examined before. For example, last year BPD overtime alone hit $66.9 million.

https://www.bostonglobe.com/metro/2018/02/16/bpd-captain-was-city-top-earner/iI4G1pnC7MOUODxo0XR4aO/story.html

Looking back even further, I came across this boston.com article from 2007:

http://archive.boston.com/news/local/articles/2007/08/23/3_police_lieutenants_are_cited_for_alleged_detail_abuses/

Quote: "

The internal audit of shifts worked in 2005 concluded that Lieutenants Haseeb Hosein, Timothy Kervin, and Ghassoub Frangie engaged in untruthful reporting of hours, performed details that conflicted with a scheduled tour of duty, and received details through unauthorized means. Hosein and Kervin were also cited with breaking the law, but officials did not provide details on the alleged infractions.

The investigators accused Hosein, a 19-year veteran, of 203 violations that include 80 counts of inaccurate reporting on a detail card, 16 counts of receiving details outside the system, 24 counts of accepting a detail scheduled during his regular patrol shifts, and one count of breaking the law and conduct unbecoming an officer.

Kervin, a 20-year veteran, was charged with 191 violations that include 68 counts of inaccurate reporting on a detail card, 46 counts of receiving details outside the system, six counts of accepting a detail scheduled during his regular patrol shifts, and one count each of breaking the law and conduct unbecoming of an officer.

Frangie, a 29-year veteran, was charged with 84 violations that include 34 counts of inaccurate reporting on a detail card, 10 counts of accepting a detail scheduled during his regular patrol shifts, three counts of receiving details outside the system, and two counts of conduct unbecoming an officer.

"

# look at the top earning overtimers
bpd_earners %>% arrange(desc(OVERTIME)) %>% head(5)
## # A tibble: 5 x 9
##   NAME  `DEPARTMENT NAM… TITLE REGULAR OVERTIME EXTRA_PAY `TOTAL EARNINGS`
##   <chr> <chr>            <chr>   <dbl>    <dbl>     <dbl>            <dbl>
## 1 Sull… Boston Police D… Poli…  110100   144874     28440           283414
## 2 Acos… Boston Police D… Poli…  101681   139806     27517           269004
## 3 Fitz… Boston Police D… Poli…  112121   136404     34853           283378
## 4 Deva… Boston Police D… Poli…  116109   129912     28925           274946
## 5 Hold… Boston Police D… Poli…   95326   127517     32208           255051
## # ... with 2 more variables: INJURED <dbl>, num_std_devs <dbl>
# look at the top earning extra pay people
bpd_earners %>% arrange(desc(EXTRA_PAY)) %>% head(5)
## # A tibble: 5 x 9
##   NAME  `DEPARTMENT NAM… TITLE REGULAR OVERTIME EXTRA_PAY `TOTAL EARNINGS`
##   <chr> <chr>            <chr>   <dbl>    <dbl>     <dbl>            <dbl>
## 1 Lee,… Boston Police D… Poli…   97414    71669    171093           340176
## 2 Hose… Boston Police D… Poli…  146894    62696    156642           366233
## 3 Kerv… Boston Police D… Poli…  125715    66067    150210           341992
## 4 Alme… Boston Police D… Poli…   86918    24289    147234           259096
## 5 King… Boston Police D… Poli…  129531    26444    147170           303145
## # ... with 2 more variables: INJURED <dbl>, num_std_devs <dbl>

From the tables above we see some familiar names, including Lee Waiman from the first Boston Globe article, as well as Haseeb Hosein and Timothy Kervin from the second and third articles. Clearly the city government is aware of the problem, but has not stopped it in at least 12 years.

That being said, police are an important and vital part of society. Let's explore some of their valiant work.

Crime

# look at data set
crime %>% head()
## # A tibble: 6 x 17
##   INCIDENT_NUMBER OFFENSE_CODE OFFENSE_CODE_GRO… OFFENSE_DESCRIP… DISTRICT
##   <chr>           <chr>        <chr>             <chr>            <chr>   
## 1 I182017604      03115        Investigate Pers… INVESTIGATE PER… B3      
## 2 I182017601      00520        Residential Burg… BURGLARY - RESI… B2      
## 3 I182017596      03831        Motor Vehicle Ac… M/V - LEAVING S… A1      
## 4 I182017595      03802        Motor Vehicle Ac… "M/V ACCIDENT -… <NA>    
## 5 I182017594      01830        Drug Violation    DRUGS - SICK AS… D14     
## 6 I182017593      00361        Robbery           ROBBERY - OTHER  D4      
## # ... with 12 more variables: REPORTING_AREA <int>, SHOOTING <chr>,
## #   OCCURRED_ON_DATE <dttm>, YEAR <int>, MONTH <int>, DAY_OF_WEEK <chr>,
## #   HOUR <int>, UCR_PART <chr>, STREET <chr>, Lat <dbl>, Long <dbl>,
## #   Location <chr>
# fix label (plain assignment -- no need to capture the result)
crime$OFFENSE_CODE_GROUP[crime$OFFENSE_CODE_GROUP == 
    "HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE"] <- "Human Trafficking | Invol Serve"

# show number of each type of crime and proportions
crime_by_count <- crime %>% group_by(OFFENSE_CODE_GROUP) %>% 
    summarize(n = n(), prop = n/nrow(crime))

# plot counts
ggplot(crime_by_count, aes(fct_rev(fct_reorder(OFFENSE_CODE_GROUP, 
    n)), n)) + geom_col(aes(fill = OFFENSE_CODE_GROUP)) + 
    theme(axis.text.x = element_text(angle = 90, 
    size = 8), legend.position = "none") + ylab("Number of Reports") + 
    xlab("Offense Category") + ggtitle("Counts")

# plot proportions
ggplot(crime_by_count, aes(fct_rev(fct_reorder(OFFENSE_CODE_GROUP, 
    prop)), prop)) + geom_col(aes(fill = OFFENSE_CODE_GROUP)) + 
    theme(axis.text.x = element_text(size = 8, 
    angle = 90), legend.position = "none") + xlab("Offense Category") + 
    ylab("Proportion of Reports") + ggtitle("Proportion")

As we can see from above, there are quite a few offense categories! Motor Vehicle Accident Response is clearly the largest, followed by Larceny (theft). This data set is quite interesting, so let's look at some more patterns.

Most Common Crimes by Time of Day

The tables below show the top 3 crimes that occur in each hour. Understandably, number one is almost always Motor Vehicle Accident Response, at roughly 10% of all incidents within that hour frame across three years of data (2015-2017). Other interesting bits:

for (i in 1:23) {
    cat(sprintf("Hour: %d\n", i))
    crime_hours <- crime %>% filter(crime$HOUR == i) %>% group_by(.$OFFENSE_CODE_GROUP)
    
    crime_hours <- crime_hours %>% summarize(n = n(), prop = n/sum(length(crime_hours$OFFENSE_CODE_GROUP))) %>% 
        arrange(desc(.$n)) %>% head(3)
    print(crime_hours)
    cat("---------------------------------------", "---------------------------------------", 
        sep = "\n\n")
}
## Hour: 1
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response   758 0.0962
## 2 Simple Assault                    657 0.0834
## 3 Medical Assistance                600 0.0762
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 2
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response   831 0.128 
## 2 Simple Assault                    604 0.0931
## 3 Medical Assistance                486 0.0749
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 3
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response   529 0.136 
## 2 Medical Assistance                336 0.0861
## 3 Simple Assault                    241 0.0618
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 4
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response   340 0.120 
## 2 Medical Assistance                284 0.0999
## 3 Vandalism                         182 0.0640
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 5
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response   437 0.158 
## 2 Medical Assistance                299 0.108 
## 3 Vandalism                         190 0.0688
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 6
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response   739 0.175 
## 2 Medical Assistance                349 0.0826
## 3 Vandalism                         265 0.0627
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 7
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1226 0.164 
## 2 Towed                             954 0.128 
## 3 Medical Assistance                477 0.0639
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 8
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1566 0.143 
## 2 Towed                            1177 0.107 
## 3 Larceny                           746 0.0681
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 9
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1462 0.118 
## 2 Towed                            1068 0.0859
## 3 Larceny                           874 0.0703
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 10
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1435 0.105 
## 2 Larceny                          1219 0.0892
## 3 Medical Assistance               1078 0.0789
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 11
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1431 0.104 
## 2 Larceny                          1275 0.0927
## 3 Medical Assistance               1067 0.0776
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 12
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Larceny                          1702 0.109 
## 2 Motor Vehicle Accident Response  1504 0.0961
## 3 Medical Assistance               1116 0.0713
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 13
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1562 0.110 
## 2 Larceny                          1449 0.102 
## 3 Medical Assistance               1100 0.0777
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 14
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1635 0.114 
## 2 Larceny                          1614 0.113 
## 3 Medical Assistance               1073 0.0749
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 15
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1763 0.127 
## 2 Larceny                          1542 0.111 
## 3 Medical Assistance               1017 0.0730
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 16
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  2061 0.122 
## 2 Larceny                          1666 0.0989
## 3 Drug Violation                   1471 0.0873
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 17
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  2204 0.126 
## 2 Drug Violation                   1878 0.107 
## 3 Larceny                          1648 0.0938
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 18
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1943 0.114 
## 2 Drug Violation                   1752 0.102 
## 3 Larceny                          1554 0.0908
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 19
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1655 0.111 
## 2 Larceny                          1297 0.0873
## 3 Drug Violation                   1269 0.0854
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 20
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1451 0.109 
## 2 Larceny                          1078 0.0810
## 3 Medical Assistance                996 0.0749
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 21
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1341 0.113 
## 2 Medical Assistance                977 0.0825
## 3 Larceny                           784 0.0662
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 22
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1342 0.123 
## 2 Medical Assistance                860 0.0790
## 3 Vandalism                         768 0.0705
## ---------------------------------------
## 
## ---------------------------------------
## Hour: 23
## # A tibble: 3 x 3
##   `.$OFFENSE_CODE_GROUP`              n   prop
##   <chr>                           <int>  <dbl>
## 1 Motor Vehicle Accident Response  1075 0.121 
## 2 Vandalism                         657 0.0737
## 3 Medical Assistance                641 0.0719
## ---------------------------------------
## 
## ---------------------------------------
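The per-hour listings above can be produced with a grouped count. A minimal sketch, assuming the `crime` data frame has HOUR and OFFENSE_CODE_GROUP columns and that dplyr is loaded, as elsewhere in this analysis:

```r
# top offense groups per hour, with within-hour proportions
crime %>% 
    count(HOUR, OFFENSE_CODE_GROUP) %>% 
    group_by(HOUR) %>% 
    mutate(prop = n/sum(n)) %>% 
    slice_max(n, n = 3) %>%  # keep the 3 most common groups per hour (ties may add rows)
    ungroup()
```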

Shooting

Let's take a look at the crimes involving a shooting. The SHOOTING column of the Crime Incident Reports data indicates whether a shooting took place. This data covers 2015 - 2018.

# count shooting crimes by offense type, with proportions and cumulative share
crime_shooting <- crime %>% filter(SHOOTING == "Y") %>% 
    count(OFFENSE_CODE_GROUP) %>% mutate(prop = n/sum(n)) %>% 
    arrange(desc(n)) %>% mutate(agg = cumsum(prop))

crime_shooting %>% ggplot(aes(fct_reorder(OFFENSE_CODE_GROUP, n), 
    prop, fill = OFFENSE_CODE_GROUP)) + geom_col() + coord_flip() + 
    theme(legend.position = "none") + 
    xlab("Offense Type") + ylab("Proportion") + ggtitle("Crimes Involving Shooting")

As we can see from the chart above, over 50% of crimes involving a shooting were aggravated assaults. Homicides made up roughly 10% of shooting crimes, 102 in total. That is a fairly low number for three years in a major city - good job BPD!
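As a quick sanity check on those figures, the homicide share can be computed directly from the filtered data. A minimal sketch, assuming the `crime` data frame loaded earlier and dplyr attached, as elsewhere in this analysis:

```r
# count and share of shooting crimes that were homicides
crime %>% filter(SHOOTING == "Y") %>% 
    summarize(n_homicide = sum(OFFENSE_CODE_GROUP == "Homicide"), 
        prop_homicide = mean(OFFENSE_CODE_GROUP == "Homicide"))
```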

Let's take a look at how the number of murders by shooting changed over the years 2015 - 2018.

# filter crime to homicides involving shooting, count by year, and plot
murders <- crime %>% filter(OFFENSE_CODE_GROUP == "Homicide", 
    SHOOTING == "Y") %>% count(YEAR) %>% ggplot(aes(YEAR, n)) + 
    geom_line() + theme_minimal() + 
    xlab("Year") + ylab("Number of Crimes") + ggtitle("Murder Counts")

# count all shooting crimes by year and plot
overall_shooting <- crime %>% filter(SHOOTING == "Y") %>% 
    count(YEAR) %>% ggplot(aes(YEAR, n)) + geom_line() + 
    theme_minimal() + xlab("Year") + ylab("Number of Crimes") + 
    ggtitle("Shooting Crimes")

# count aggravated assaults involving shooting by year and plot
agg_assaults <- crime %>% filter(OFFENSE_CODE_GROUP == "Aggravated Assault", 
    SHOOTING == "Y") %>% count(YEAR) %>% ggplot(aes(YEAR, n)) + 
    geom_line() + theme_minimal() + 
    xlab("Year") + ylab("Number of Crimes") + ggtitle("Aggravated Assaults")

# look at gun recovery data
gun_recovery %>% head()
## # A tibble: 6 x 4
##   CollectionDate CrimeGunsRecovered GunsSurrenderedSafe… BuybackGunsRecov…
##   <chr>                       <int>                <int>             <int>
## 1 8/20/2014                       2                    3                 1
## 2 8/21/2014                       2                    0                 4
## 3 8/22/2014                       0                    0                 2
## 4 8/25/2014                       8                    3                 0
## 5 8/26/2014                       9                    0                 0
## 6 8/27/2014                       1                    0                 0
# add year and month features parsed from the collection date
gun_recovery <- gun_recovery %>% mutate(Year = year(mdy(CollectionDate)), 
    Month = month(mdy(CollectionDate)))

# plot guns recovered from crimes
gun_crimes <- gun_recovery %>% ggplot(aes(Year, CrimeGunsRecovered)) + 
    geom_jitter() + theme_minimal() + xlab("Year") + 
    ylab("Guns Recovered") + ggtitle("Guns Recovered from Crimes")

# plot voluntary surrender of guns
vol_surr <- gun_recovery %>% ggplot(aes(Year, GunsSurrenderedSafeguarded)) + 
    geom_jitter() + theme_minimal() + 
    xlab("Year") + ylab("Number of Guns") + ggtitle("Number of Guns Voluntarily Surrendered") + 
    scale_x_continuous(breaks = c(2014, 2015, 2016, 2017))

# plot results of gun buyback program
gun_buyback <- gun_recovery %>% ggplot(aes(Year, BuybackGunsRecovered)) + 
    geom_jitter() + theme_minimal() + xlab("Year") + 
    ylab("Number of Guns Bought") + ggtitle("Gun Buyback Program")


# plot
gridExtra::grid.arrange(overall_shooting, murders, agg_assaults, 
    nrow = 1)

gridExtra::grid.arrange(gun_crimes, vol_surr, gun_buyback, nrow = 3)
## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
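The "Removed 1 rows" warnings above suggest a single record with a missing value, most likely a CollectionDate that lubridate's mdy() could not parse (leaving Year as NA) or an NA count. A hedged check, assuming the gun_recovery frame with the Year column added above:

```r
# find the row(s) that triggered the "removed 1 rows" warnings
gun_recovery %>% filter(is.na(Year) | is.na(CrimeGunsRecovered) | 
    is.na(GunsSurrenderedSafeguarded) | is.na(BuybackGunsRecovered))
```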

From the first set of graphs we see the following:

From the second set of graphs related to gun recovery we see the following:

Economic Indicators

Let's take a look at the Economic Indicators dataset. It contains monthly indicators for Boston covering tourism (Logan Airport traffic, hotel occupancy and rates), the job market, the real estate development pipeline, and housing.

econ_indicators %>% head()
## # A tibble: 6 x 19
##    Year Month logan_passengers logan_intl_flights hotel_occup_rate
##   <int> <int>            <int>              <int>            <dbl>
## 1  2013     1          2019662               2986            0.572
## 2  2013     2          1878731               2587            0.645
## 3  2013     3          2469155               3250            0.819
## 4  2013     4          2551246               3408            0.855
## 5  2013     5          2676291               3240            0.858
## 6  2013     6          2824862               3402            0.911
## # ... with 14 more variables: hotel_avg_daily_rate <dbl>,
## #   total_jobs <int>, unemp_rate <dbl>, labor_force_part_rate <dbl>,
## #   pipeline_unit <int>, pipeline_total_dev_cost <dbl>,
## #   pipeline_sqft <int>, pipeline_const_jobs <dbl>, foreclosure_pet <int>,
## #   foreclosure_deeds <int>, med_housing_price <int>,
## #   housing_sales_vol <int>, new_housing_const_permits <int>,
## #   `new-affordable_housing_permits` <int>
names(econ_indicators)
##  [1] "Year"                           "Month"                         
##  [3] "logan_passengers"               "logan_intl_flights"            
##  [5] "hotel_occup_rate"               "hotel_avg_daily_rate"          
##  [7] "total_jobs"                     "unemp_rate"                    
##  [9] "labor_force_part_rate"          "pipeline_unit"                 
## [11] "pipeline_total_dev_cost"        "pipeline_sqft"                 
## [13] "pipeline_const_jobs"            "foreclosure_pet"               
## [15] "foreclosure_deeds"              "med_housing_price"             
## [17] "housing_sales_vol"              "new_housing_const_permits"     
## [19] "new-affordable_housing_permits"
unique(econ_indicators$total_jobs)
## [1] 0
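Since total_jobs contains only zeros, and every plot below filters on values greater than zero, zeros in this dataset appear to stand in for missing data. One might flag all-zero columns up front rather than discover them mid-analysis; a sketch assuming dplyr (with `across()`, dplyr 1.0+):

```r
# name the numeric indicator columns that are zero everywhere (e.g. total_jobs)
all_zero <- econ_indicators %>% 
    summarize(across(where(is.numeric), ~ all(. == 0, na.rm = TRUE)))
names(all_zero)[unlist(all_zero)]
```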

Flights / Tourism

logan_pass <- econ_indicators %>% filter(logan_passengers > 0) %>% 
    ggplot(aes(Month, logan_passengers)) + geom_point() + 
    geom_smooth(aes(group = Year, color = Year)) + 
    xlab("Month") + ylab("Passengers") + 
    ggtitle("Logan Airport Passengers per Month") + 
    scale_x_continuous(breaks = 1:12)

logan_intl_pass <- econ_indicators %>% filter(logan_intl_flights > 0) %>% 
    ggplot(aes(Month, logan_intl_flights)) + geom_point() + 
    geom_smooth(aes(group = Year, color = Year)) + 
    xlab("Month") + ylab("Flights") + 
    ggtitle("Logan Airport International Flights per Month") + 
    scale_x_continuous(breaks = 1:12)

gridExtra::grid.arrange(logan_pass, logan_intl_pass)
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'

From the graphs above we see the following trends:

Hotels

hotel_occ <- econ_indicators %>% filter(hotel_occup_rate > 0) %>% 
    ggplot(aes(Month, hotel_occup_rate)) + geom_point() + 
    geom_smooth(aes(group = Year, color = Year)) + 
    xlab("Month") + ylab("Percent of Hotel Rooms Occupied") + 
    ggtitle("Hotel Occupancy Rates by Month") + 
    scale_x_continuous(breaks = 1:12)

hotel_prices <- econ_indicators %>% filter(hotel_avg_daily_rate > 0) %>% 
    ggplot(aes(Month, hotel_avg_daily_rate)) + geom_point() + 
    geom_smooth(aes(group = Year, color = Year)) + 
    xlab("Month") + ylab("Hotel Average Daily Price") + 
    ggtitle("Hotel Prices by Month") + 
    scale_x_continuous(breaks = 1:12)

gridExtra::grid.arrange(hotel_occ, hotel_prices)
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'

From the graphs above we see the following trends:

Job Market

unemp_rate <- econ_indicators %>% filter(unemp_rate > 0.001) %>% 
    ggplot(aes(Month, unemp_rate)) + geom_point() + 
    geom_smooth(aes(group = Year, color = Year)) + 
    xlab("Month") + ylab("Unemployment Rate") + 
    ggtitle("Unemployment Rate by Month") + 
    scale_x_continuous(breaks = 1:12)

part_rate <- econ_indicators %>% filter(labor_force_part_rate > 0.1) %>% 
    ggplot(aes(Month, labor_force_part_rate)) + geom_point() + 
    geom_smooth(aes(group = Year, color = Year)) + 
    xlab("Month") + ylab("Participation Rate") + 
    ggtitle("Labor Force Participation Rate by Month") + 
    scale_x_continuous(breaks = 1:12)

gridExtra::grid.arrange(unemp_rate, part_rate)
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'

The graphs above show the following trends:

Real Estate Board Approved Development Projects

units_approved <- econ_indicators %>% filter(pipeline_unit > 0) %>% 
    ggplot(aes(Month, pipeline_unit, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Units Approved") + ggtitle("Real Estate Units Approved") + 
    theme(legend.position = "none") + scale_x_continuous(breaks = 1:12)

cost_of_devs <- econ_indicators %>% filter(pipeline_total_dev_cost > 0) %>% 
    ggplot(aes(Month, pipeline_total_dev_cost, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Cost of Developments") + 
    ggtitle("Total Cost of Developments") + theme(legend.position = "none") + 
    scale_x_continuous(breaks = 1:12)

tot_sqft_devs <- econ_indicators %>% filter(pipeline_sqft > 0) %>% 
    ggplot(aes(Month, pipeline_sqft, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Sq Ft of Units Approved") + 
    ggtitle("Total Square Feet of Units Approved") + 
    theme(legend.position = "none") + scale_x_continuous(breaks = 1:12)

const_jobs <- econ_indicators %>% filter(pipeline_const_jobs > 0) %>% 
    ggplot(aes(Month, pipeline_const_jobs, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Number of Jobs") + ggtitle("Construction Jobs Created") + 
    guides(fill = guide_legend(keywidth = 1, keyheight = 1, title = "Year", 
        label.position = "bottom")) + 
    theme(legend.text = element_text(size = 5)) + 
    scale_x_continuous(breaks = 1:12)

gridExtra::grid.arrange(units_approved, cost_of_devs, tot_sqft_devs, 
    const_jobs)

The graphs above show the following trends:

Housing

forc_pets <- econ_indicators %>% filter(foreclosure_pet > 0) %>% 
    ggplot(aes(Month, foreclosure_pet, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Number of Foreclosure Petitions") + 
    ggtitle("Foreclosure Petitions") + theme(legend.position = "none") + 
    scale_x_continuous(breaks = 1:12)


forc_deeds <- econ_indicators %>% filter(foreclosure_deeds > 0) %>% 
    ggplot(aes(Month, foreclosure_deeds, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Number of Foreclosure Deeds") + 
    ggtitle("Foreclosure Deeds") + 
    guides(fill = guide_legend(keywidth = 1, keyheight = 1, title = "Year", 
        label.position = "bottom")) + 
    theme(legend.text = element_text(size = 5)) + 
    scale_x_continuous(breaks = 1:12)

housing_prices <- econ_indicators %>% filter(med_housing_price > 0) %>% 
    ggplot(aes(Month, med_housing_price, color = Year, group = Year)) + 
    geom_smooth(se = FALSE) + 
    xlab("Month") + ylab("Median Housing Price") + 
    ggtitle("Median Housing Sales Price") + 
    guides(fill = guide_legend(keywidth = 1, keyheight = 1, title = "Year", 
        label.position = "bottom")) + 
    theme(legend.text = element_text(size = 5)) + 
    scale_x_continuous(breaks = 1:12)

housing_sales <- econ_indicators %>% filter(housing_sales_vol > 0) %>% 
    ggplot(aes(Month, housing_sales_vol, color = Year, group = Year)) + 
    geom_smooth(se = FALSE) + 
    xlab("Month") + ylab("Number of Houses Sold") + 
    ggtitle("Housing Sales Volume by Year and Month") + 
    theme(legend.position = "none") + scale_x_continuous(breaks = 1:12)

housing_const_permits <- econ_indicators %>% filter(new_housing_const_permits > 0) %>% 
    ggplot(aes(Month, new_housing_const_permits, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Number of Housing Construction Permits") + 
    ggtitle("Housing Construction") + theme(legend.position = "none") + 
    scale_x_continuous(breaks = 1:12)

affordable_permits <- econ_indicators %>% filter(`new-affordable_housing_permits` > 0) %>% 
    ggplot(aes(Month, `new-affordable_housing_permits`, fill = Year)) + 
    geom_col(position = "stack") + 
    xlab("Month") + ylab("Number of Affordable Housing Construction Permits") + 
    ggtitle("Affordable Housing Construction") + 
    guides(fill = guide_legend(keywidth = 1, keyheight = 1, title = "Year", 
        label.position = "bottom")) + 
    theme(legend.text = element_text(size = 5)) + 
    scale_x_continuous(breaks = 1:12)

gridExtra::grid.arrange(housing_sales, housing_prices, nrow = 2)
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'

gridExtra::grid.arrange(housing_const_permits, affordable_permits, 
    nrow = 1)

From the above graphs we can see the following trends:

gridExtra::grid.arrange(forc_pets, forc_deeds, nrow = 1)

A foreclosure petition is a court document filed by a lender (typically a bank) against a delinquent homeowner to begin foreclosure proceedings. The petition is then delivered to the homeowner along with a court summons.

A foreclosure deed is when a lender accepts the deed (the document stating ownership) of a property instead of completing a foreclosure on the house.

The graphs above show the following trends:

Environment

Conclusion